Mood and Text Analyses

Descriptive statistics

Study description

The dataset we’re looking at today is a fairly unique one. So far, we have collected data from about 65 adolescent participants (with more coming in) who have either recently attempted suicide (Suicide Attempt), reported suicidal ideation (Suicidal Ideation), or are part of a psychiatric control group (Psychiatric Control). We then put our in-house Effortless Assessment of Risk States (EARS) app onto their phones (both Android and iPhone), which collects a wide range of data from the phone’s sensors. One fairly unique feature of our app is a keyboard logger that records everything typed by the user. In addition to the passive mobile sensing, participants also receive a daily Ecological Momentary Assessment (EMA) that asks them to rate their mood on a scale of 1-100.

In this document, we’re going to first explore the daily mood score, and then we’re going to see how daily mood is related to various features we can extract from the text data we’ve gathered from the adolescents’ phones.

Daily mood score distribution

Let’s go ahead and take a look at the distribution of what we’re trying to predict: daily reported mood.

With this graph, each individual has their own daily mood score distribution (as indicated by color). This is a density plot, so a higher density within a range means there are more data points in that range. For instance, the participant with the red distribution in front seems to have major spikes at 60, 80, 90, and 100, meaning there are more mood ratings at those values than at others, such as 75. Looking at all the participants, there seems to be quite a bit of variability in reported mood from person to person. However, it might be useful to break it down further and look at whether there are any differences in distributions based on psychiatric group.

From what I can tell, it looks like there’s quite a bit of difference between the psychiatric control group and the suicide attempt and suicidal ideation groups, but probably not much of a difference between suicide attempt and suicidal ideation. Let’s run a quick analysis to see if that’s the case. We’ll use the dabestr package, which is helpful for visualizing effects between different groups.
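If you haven’t used dabestr before, a minimal sketch of the call looks something like this (this assumes the 0.3.x API and hypothetical column names group and daily_mood):

```r
library(dplyr)
library(dabestr)

# Shared-control estimation plot, with Psychiatric Control as the baseline
mood_est <- mood_df %>%
  dabest(group, daily_mood,
         idx = c("Psychiatric Control", "Suicidal Ideation", "Suicide Attempt"),
         paired = FALSE) %>%
  mean_diff()

plot(mood_est)  # top: raw data; bottom: bootstrapped mean differences vs. control
```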

Alright, it looks as though my hunch was somewhat correct. The top half of this graph shows all of the data points within the different groups. On the bottom half, the center line anchors the mean daily mood of the psychiatric control group at 0 (think of it as a baseline). The two sideways distributions show whether each group’s mean is statistically significantly different from the psychiatric control. If the vertical black line for a group crosses the horizontal zero line (as it does for the suicidal ideation group), the difference in means is not statistically significant. Daily reported mood in the suicide attempt group is significantly different from the psychiatric control group, but the suicidal ideation group is not significantly different from the psychiatric control group. Additionally, as anticipated, there is no significant difference between the suicidal ideation and suicide attempt groups.

Now, these differences are rather interesting, but I’m curious about something a little different. For that, we’ll have to get everyone on the same playing field. There still seems to be a wide range in how people rate their mood on any given day; in essence, it’s possible that one person’s 75 could be another person’s 50. Instead of the raw mood score, it might be more interesting to look at deviations from an individual’s average daily mood score. To do this, we’ll take each person’s average score and subtract it from each of their daily scores, giving us deviations from their mean.

Standardizing the deviations

As you can see, it looks like some people report wide shifts in their mood, whereas other people are pretty narrow in the way they report their mood. This still doesn’t account for the fact that one individual’s shift in mood might look bigger than another individual’s when, in reality, they may be the same. It would be helpful if we could standardize shifts in mood so that everyone’s deviations are on the same scale. To do this, we’ll apply a z-score transform to each individual’s deviations. This standardizes each individual’s deviations from their own mean so that each person’s deviations are comparable to everyone else’s. When we do this, what does it finally look like?
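Concretely, here is a minimal dplyr sketch of that transformation, assuming hypothetical columns participant_id and daily_mood (the 1-100 EMA rating):

```r
library(dplyr)

mood_df <- mood_df %>%
  group_by(participant_id) %>%
  mutate(
    mood_dev = daily_mood - mean(daily_mood, na.rm = TRUE),  # deviation from own mean
    mood_z   = mood_dev / sd(daily_mood, na.rm = TRUE)       # z-scored within person
  ) %>%
  ungroup()
```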

By Individual

That’s more like it! We can now see that, while there is still some variability in mood deviations, the deviations look pretty comparable across individuals.

By Individual and Group

In this second graph, I’ve broken the data down further by study group, and we can see that the deviations are also pretty comparable across individuals and groups. Now that the individuals are comparable to one another, and the groups are comparable to one another, we can try some new things with the data without worrying about group or individual effects (don’t worry, we’ll add those back into later analyses). Hop up to the top of the page and click on the next tab, “Text Analysis,” to see the next set of analyses.

Text Analysis

So far, we’ve only looked at the EMA daily mood report data. Now we’re going to add in the text data from the EARS keyboard logger. Let’s take a look at both the text and the daily mood report. In the figure below, each participant’s first day is marked by the black dot on the left-hand side and their last day in the study by the black dot on the right-hand side. The blue dots indicate days on which we have data only from the keyboard. The gray dots indicate days on which we only have data from the daily mood EMA. The orange dots indicate days on which we have data from both the keyboard and the daily mood EMA. For our analyses, we’ll only be looking at data from the orange dots: days on which we have both text and EMA data. It should be noted that preliminary analyses did show differences in mood between days with and without text data, but we will leave a more thorough investigation of this missing data for another analysis.

With this information, we can break the data into two groups: data from positive mood days and data from negative mood days. From that, we can see how people’s texts on positive days differ from their texts on negative days, giving us better insight into adolescent texting behavior and better direction for where our future research can go.

Word clouds

Now that we’ve loaded the data, let’s go ahead and play around with it a little bit. We’re going to split the dataset up based on the standardized scores we just made from daily reported mood. We’ll break the dataset into two groups: any text entered on days when an individual’s mood was two standard deviations above their mean (let’s call these positive days), and any text entered on days when an individual’s mood was two standard deviations below their mean (let’s call these negative days). Now we can look at how texting patterns differ on positive and negative days. Let’s look at the most common words used on days when individuals report high overall mood.
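A sketch of that split might look like the following, assuming a hypothetical text_df of keyboard entries keyed by participant and date:

```r
# Label each day by where its z-scored mood falls, then keep the extremes
text_days <- text_df %>%
  left_join(select(mood_df, participant_id, date, mood_z),
            by = c("participant_id", "date")) %>%
  mutate(day_type = case_when(
    mood_z >=  2 ~ "positive",
    mood_z <= -2 ~ "negative",
    TRUE         ~ "neutral"
  )) %>%
  filter(day_type != "neutral")
```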

Okay! It looks like we have words like “good” and “baby,” which is an encouraging sign so far. I wonder if the negative word cloud will be very different. Let’s check out what words are being used on days when individuals report low daily mood.

Huh, well I think what’s interesting here is that there doesn’t seem to be much difference between the words used on positive days and the words used on negative days. You can still see words like “good” and “thank.” So simply breaking up word usage into positive days and negative days isn’t going to be enough for us to notice differences. We might have to think about how we can extract some signal from this noise.

As an aside, if it wasn’t clear, the population we’re collecting data from is high school students, and that is pretty well reflected in some of the more colorful word choices shown here.

Word counts

word        negative   positive   total
elderly            3          0       3
ended              1          2       3
ew                 3          0       3
exa                3          0       3
expensive          1          2       3
explain            2          1       3
extremely          1          2       3
fell               2          1       3
figured            3          0       3
fill               2          1       3
fishing            0          3       3

Now we’ve got a dataset of all of the words being typed. In addition, we also have information on how many times any given word was used on positive days and on negative days. For example, we can see that ‘ew’ is used 3 times in our dataset, and all 3 uses are on negative mood days.

It might be good to get a feel for the most commonly used words on negative days and the most commonly used words on positive days.

While there are some unique words on each list, we can see that there are similar words on both, such as “like” and “wanna.” This tells us that these words are just commonly used, regardless of whether it’s a good or bad day. Now that we have a good sense of the words that are representative of positive and negative days, let’s take a look at the relationship between word usage on both of those days.

While it’s clear that there’s a positive relationship here, it’s hard to tell much more because a lot of the data points are scrunched up in the lower left quadrant. Let’s do a quick and dirty log transformation of the axes to spread these points out. Not only will it make more visual sense, but there will be more space between the words, which means we can deploy a dynamic tool that displays the word associated with a data point whenever we hover over it.
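Here is a minimal sketch of how a plot like that could be built, assuming the word_counts table from above and the ggplot2/plotly pair for hover tooltips:

```r
library(ggplot2)
library(plotly)

# +1 avoids log(0) for words that never appear on one type of day
p <- ggplot(word_counts, aes(x = negative + 1, y = positive + 1, text = word)) +
  geom_point(alpha = 0.4) +
  scale_x_log10() +
  scale_y_log10() +
  labs(x = "Uses on negative days", y = "Uses on positive days")

ggplotly(p, tooltip = "text")  # hover over a point to see its word
```

Go ahead and put your cursor over the data points!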

From a cursory glance, it looks like there is a positive relationship between word counts on positive days and on negative days. This indicates that even words used a lot on negative days (such as ‘like’) are also used a lot on positive days. That’s not really helpful if we want to find different word usage depending on daily mood. We’re more interested in the data points that lie further away from a diagonal line going up and to the right. Let’s see if we can classify the words that appear more frequently on positive or negative days (as opposed to words that appear on both).

First, let’s get a simple measure of how often a word occurs on a positive day versus any day: the proportion of each word’s total uses that fall on positive days.
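In code, that measure (pos_rate) is just the ratio of positive-day uses to total uses:

```r
word_counts <- word_counts %>%
  mutate(pos_rate = positive / total) %>%  # share of a word's uses on positive days
  arrange(desc(pos_rate), desc(total))
```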

word        negative   positive   total   pos_rate
zumiez             0          6       6   1.0000000
yoo                0          3       3   1.0000000
yesssss            0          3       3   1.0000000
wet                0          8       8   1.0000000
welp               0          4       4   1.0000000
sooo               1          2       3   0.6666667
smile              2          4       6   0.6666667
slowly             2          4       6   0.6666667
silly              1          2       3   0.6666667
shitty             1          2       3   0.6666667
seriously          2          4       6   0.6666667

As we can see here, this essentially shows us which words are most likely to be used on positive days. Very tentatively, if we see the word zumiez, we can assume it is a positive day for that individual (again, let’s try not to extrapolate too much from 6 uses of a word, but it’s a start).

But now we might be interested in a similar, but slightly different, metric. Before, we were looking at the likelihood (not an actual likelihood, mind you) that a given word appears on a positive day. Now let’s look at which words we are most likely to see on a positive day. Again, similar, but not the same.

Here we see a new list of words appear! Now we have words like “like,” “baby,” and “love.” The only problem is that words that are most likely to be found on positive days are not necessarily words that are only used on positive days. How can we reconcile these two things and bring out words that are both unique to positive days and likely to occur on positive days? One way is the harmonic mean. It is useful here because, while the arithmetic mean can be dominated by the larger of two values, the harmonic mean is pulled toward the smaller one, so a word only scores high if it scores high on both metrics.
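Assuming the word_counts table from above, the two metrics and their harmonic mean could be computed like this:

```r
word_counts <- word_counts %>%
  mutate(
    pos_freq_pct = positive / sum(positive),              # P(seeing this word | positive day)
    pos_hmean    = 2 / (1 / pos_rate + 1 / pos_freq_pct)  # harmonic mean of the two metrics
  ) %>%
  arrange(desc(pos_hmean))
```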

word    negative   positive   total   pos_rate    pos_freq_pct   pos_hmean
like         148         95     243   0.3909465      0.0169916   0.0325677
good          69         91     160   0.5687500      0.0162762   0.0316467
oh            87         71     158   0.4493671      0.0126990   0.0246999
get           72         68     140   0.4857143      0.0121624   0.0237306
want          65         65     130   0.5000000      0.0116258   0.0227233
baby          16         64      80   0.8000000      0.0114470   0.0225710
wanna         48         62     110   0.5636364      0.0110893   0.0217506
would         38         60      98   0.6122449      0.0107315   0.0210933
love         116         60     176   0.3409091      0.0107315   0.0208080
yes           34         54      88   0.6136364      0.0096584   0.0190174

Unfortunately, it looks as though the harmonic mean didn’t produce meaningfully different rankings, so it doesn’t really add much to our understanding of the data. Another way of linking these two metrics is to first pass each through a cumulative distribution function (CDF). For each word, this tells us where its value lies in the distribution of positive rates and in the distribution of positive frequency percentages. This puts both terms onto a similar 0-1 scale, making it much easier to compare them (by taking the harmonic mean of the CDF values).
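A sketch of that transformation, using pnorm() over the z-scored metrics (one reasonable reading of “normal CDF”):

```r
word_counts <- word_counts %>%
  mutate(
    pos_rate_cdf      = pnorm(as.numeric(scale(pos_rate))),      # position of each word in each
    pos_freq_cdf      = pnorm(as.numeric(scale(pos_freq_pct))),  # distribution, on a 0-1 scale
    pos_normcdf_hmean = 2 / (1 / pos_rate_cdf + 1 / pos_freq_cdf)
  ) %>%
  arrange(desc(pos_normcdf_hmean))
```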

word    negative   positive   total   pos_rate    pos_freq_pct   pos_normcdf_hmean
dick           2         34      36   0.9444444      0.0060812   0.9829627
cum            0         21      21   1.0000000      0.0037560   0.9694369
ii             1         22      23   0.9565217      0.0039349   0.9694150
bet            3         24      27   0.8888889      0.0042926   0.9656588
hbu            2         21      23   0.9130435      0.0037560   0.9592591
cause          7         31      38   0.8157895      0.0055446   0.9566946
baby          16         64      80   0.8000000      0.0114470   0.9530434
wyd            6         24      30   0.8000000      0.0042926   0.9444997
awe            2         18      20   0.9000000      0.0032195   0.9380321
ass            7         24      31   0.7741935      0.0042926   0.9362006

Well, as you can probably see, this is a very different list of words. It appears that the uniquely positive words that teenagers use on positive days tend to be more sexual and romantic in nature. While that may be uncomfortable, it’s still important information for us to be aware of, and it’s not altogether surprising that individuals might feel better on days when they are sexually active, or vice versa. From this table, we can now see the words that are most likely to show up on positive days and to be specific to positive days (pos_normcdf_hmean). Essentially, for a word with a high score here: if we see the word being used, there’s a high chance it’s a positive day, and if it’s a positive day, there’s a high probability someone is using the word.

Let’s go ahead and run the same analyses for words on negative days.

word        negative   positive   total   pos_rate    pos_freq_pct   neg_normcdf_hmean
sorry             82          8      90   0.0888889      0.0014309   0.9284597
ok               147         25     172   0.1453488      0.0044715   0.9010648
lol              130         24     154   0.1558442      0.0042926   0.8952027
something         27          3      30   0.1000000      0.0005366   0.8923124
sage              29          4      33   0.1212121      0.0007154   0.8910254
yeah             134         28     162   0.1728395      0.0050080   0.8851659
ezoo              21          0      21   0.0000000      0.0000000   0.8829399
think             60         16      76   0.2105263      0.0028617   0.8603949
anything          23          4      27   0.1481481      0.0007154   0.8474445
cuz               32          9      41   0.2195122      0.0016097   0.8418649

Interestingly, we get words like “sorry” and “ok.” While the former might be obvious, it’s not unbelievable that individuals who write “ok” might be having a low mood day, especially given the cultural meme that using “ok” in a text can be seen as standoffish or negative. Now that we’ve been able to pull some of the signal from the noise, let’s see if the word clouds are a little more meaningful. Let’s start with the positive word cloud.

Great, we can already start to see some differences, like “good” and “thank.” Alright, let’s look at the negative word cloud now.

Well, that’s more like it! There’s still some noise in there, but there are definitely things we can pick out that make sense. In the positive word cloud we see words like “good,” “happy,” and “want.” In the negative word cloud we see things like “sorry,” “bad,” and “shit.” Now that we have this, let’s see if there’s another way to look at the relationship between the two types of days. I’ll create a scatter plot of the positive and negative CDF harmonic means so we can see where the probability of each word falls in relation to bad days and good days.

While the shape of the plot is relatively interesting, what’s more important are the values in the upper left and lower right quadrants. Remember, words in the upper left quadrant are the words most likely to occur on positive days and, when they do show up, they’re likely to occur only on positive days. Conversely, words in the lower right quadrant are the words most likely to occur on negative days and, when they do show up, they’re likely to occur only on negative days.

If you hover your cursor over the points, you can see which word each point corresponds to. There are plenty of random words in here (“ezoo”), but there are clearly some interesting ones. Some positive words are “baby” and “sweetheart,” while some interesting negative words are “depressed” and “drunk.” Now that we’ve gotten a good look at how we can find differences in the data based on positive vs. negative days, it will be interesting to see if we can find associations in the data.

Mixed-effects modeling

Mixed-effects models are a class of statistical models that allow us to account for data that come from repeatedly measuring the same person or group. For example, let’s say we have person A, who is considered a night owl. If person A is asked what time they went to sleep, there might be some variability in their answers, but the times are probably consistently late. In contrast, person B is a morning person, so they go to bed much earlier. If you asked them what their bedtime was over the past week, it would probably be pretty early, again with some variability.

If we were to try to understand how sleeping affects a certain outcome, maybe job performance, we would want to look at a few things. For instance, person A goes to sleep, on average, much later than person B. This is called a between-subjects effect, because you’re making comparisons between individuals, and it might be considered a trait-like factor. So, for this hypothetical, we might say that people who go to sleep later are more likely to have poorer job performance. However, we might also be interested in how changes in sleeping routine might impact daily job performance. Perhaps, if an individual goes to bed later than they normally would (regardless of what their normal bedtime is), they will have a worse day at work the next day. The variability that an individual shows around their normal bedtime (regardless of whether they’re a morning or an evening person) would be their within-subjects effect, and would be considered a state-like factor. Normal regression techniques do not take these different effects into account, so we will have to incorporate mixed-effects models in order to understand individual differences and the between- and within-subjects effects.

For these analyses, we will be examining how features within the text data (extracted and created with a separate pre-processing script) might be useful for predicting daily reported mood via EMA surveys collected with the EARS app. For the following analyses, we only included text that was entered into keyboards while using social communication apps (e.g., SMS, Snapchat, Discord), and removed text entered while using apps that were not considered social communication apps (e.g., Chrome, Podcast, Google Search). It’s important to note that we extract the text features from text entered in the 24 hours prior to the daily mood report. Essentially, we are trying to see if these text features can predict subsequent daily mood. Another way that our approach differs from others is that we purposely did not remove stop words (e.g., “yeah”, “okay”) from our analysis, as they may be important parts of speech used in communication, and removing them did not make sense for our specific analysis.

The three features we extracted from the data were: daily word count (n), daily sentiment score, and daily First Person Pronoun (FPP) usage. I will show mixed-effects models for each of the three features that take individual differences into account. I will start with an aggregated model (no differentiation of between- and within-subjects effects), then show the between-subjects effects, and then the within-subjects effects.
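To make the three model types concrete, here is a hedged sketch of how the word count models could be set up with lmerTest (lme4 with p-values), assuming hypothetical columns participant_id, daily_mood, and word_count:

```r
library(dplyr)
library(lmerTest)

model_df <- model_df %>%
  group_by(participant_id) %>%
  mutate(
    mood_mean = mean(daily_mood, na.rm = TRUE),  # person averages (between-subjects)
    wc_mean   = mean(word_count, na.rm = TRUE),
    wc_dev_z  = as.numeric(scale(word_count))    # z-scored deviations (within-subjects)
  ) %>%
  ungroup()

# Aggregated: raw predictor, random intercept per participant
m_agg <- lmer(daily_mood ~ word_count + (1 | participant_id), data = model_df)

# Between-subjects: one row per person, person means only
m_btw <- lm(mood_mean ~ wc_mean,
            data = distinct(model_df, participant_id, mood_mean, wc_mean))

# Within-subjects: deviations from each person's own average
m_wth <- lmer(daily_mood ~ wc_dev_z + (1 | participant_id), data = model_df)
```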

Daily Word Count

Non-aggregated model

While these tables may look intimidating, the important parts to pay attention to are the std. Beta column and the p column. The std. Beta column tells us how big the effect is (the further from zero, the larger the effect), and the p column tells us whether the effect is significant; a p value of less than 0.05 is conventionally considered significant. A positive effect means that as one value goes up (e.g., Word Count), so does the other (e.g., Daily Mood Score). A negative effect means that as one value goes up, the other goes down. (Side note: don’t worry about the “Intercept” row of the table; it’s relatively meaningless, at least for our purposes.)

For the non-aggregated model, we can see, based on the std. Beta and its associated p value, that there is no significant relationship between daily word count and daily reported mood (because p is greater than 0.05). In essence, text word count is not related to daily mood.

                 Daily Mood Score
Predictors       Estimates   std. Beta   CI              standardized CI   p
(Intercept)      54.96                   50.02 – 59.89                     <0.001
Word Count (n)   -0.73       -0.03       -2.27 – 0.81    -0.09 – 0.03      0.354

Random Effects
σ2                    351.60
τ00 participant_id    252.08
ICC                     0.42
N participant_id          45
Observations            1041

Notice how the standard regression line (in orange) shows quite a strong negative relationship between word count and mood, but once the model accounts for individual differences, the relationship disappears (as seen in the blue lines). This really highlights the importance of incorporating individual differences into the model. If you put your cursor over a blue line, you can see which participant it corresponds to.

Between-subjects effects

Again, the between-subjects question is essentially whether the average amount an individual texts within a day relates to their average daily mood. You can think of these as trait-like variables. However, as we can see in the table and in the plot, there is no significant relationship between the two variables.

                      Mean Daily Mood Score
Predictors            Estimates   CI              p
(Intercept)           57.65       50.81 – 64.49   <0.001
Mean Word Count (n)   -0.01       -0.02 – 0.01    0.267

Observations          45
R2 / R2 adjusted      0.029 / 0.006

Within-subjects effects

In order to get the within-subjects effects, we standardized all of the deviations within an individual (like we did with the mood data previously) so that we can compare across individuals. The variable we have here is how much an individual deviates from their average daily word count. In essence, the question we are asking is: if someone texts more than they usually would, is that indicative of daily mood? There appears to be a weak negative relationship, but it could just be noise, and the table shows that the effect is only marginally significant (p = 0.064). While it’s difficult to say, this could mean that if someone texts less than they usually would, that could be related to better mood. Or, conversely, if someone texts more than they usually would, that would be predictive of worse mood. However, this effect is only marginally significant, so it should be taken with caution.

                                     Daily Mood Score
Predictors                           Estimates   std. Beta   CI             standardized CI   p
(Intercept)                          55.12                   50.19 – 60.05                    <0.001
Deviations from Average Word Count   -1.10       -0.04       -2.26 – 0.06   -0.09 – 0.00      0.064

Random Effects
σ2                    350.65
τ00 participant_id    252.76
ICC                     0.42
N participant_id          45
Observations            1041

Sentiment Analysis

Non-aggregated model

Next, we’ll take a look at how daily sentiment is associated with mood scores. As we can see in this non-aggregated model, both in the table and in the figure, daily sentiment is predictive of daily mood score. Again, as you can see in the figure, the orange line seems to be overestimating this relationship, and the mixed-effects model dampens the effect a little. Now let’s see if disaggregating will tell us what is driving this relationship (i.e., between- or within-subjects effects).

                  Daily Mood Score
Predictors        Estimates   std. Beta   CI              standardized CI   p
(Intercept)       54.75                   49.85 – 59.64                     <0.001
Daily Sentiment   4.98        0.09        2.46 – 7.51     0.05 – 0.14       <0.001

Random Effects
σ2                    346.90
τ00 participant_id    248.90
ICC                     0.42
N participant_id          45
Observations            1041

Between-subjects effects

Hmm, it looks as though there aren’t any between-person effects. It’s interesting that individuals who text more positive things, on average, don’t seem to have higher average daily mood.

                       Mean Daily Mood Score
Predictors             Estimates   CI               p
(Intercept)            54.26       48.75 – 59.76    <0.001
Mean Daily Sentiment   10.86       -17.06 – 38.79   0.437

Observations           45
R2 / R2 adjusted       0.014 / -0.009

Within-subjects effects

Okay! Now we’re getting a strong(ish) signal. It looks as though there is a within-person effect here. Essentially, if an individual texts more positive things than they usually do, they are more likely to report a higher mood rating. While this seems fairly intuitive once explained like that, it is helpful to know that we are able to detect some signal in the noise.

                                          Daily Mood Score
Predictors                                Estimates   std. Beta   CI             standardized CI   p
(Intercept)                               55.11                   50.19 – 60.04                    <0.001
Deviations from Average Daily Sentiment   2.43        0.09        1.27 – 3.58    0.05 – 0.14       <0.001

Random Effects
σ2                    345.98
τ00 participant_id    253.08
ICC                     0.42
N participant_id          45
Observations            1041

First Person Pronoun (FPP) Use

Next, we wanted to look at a potentially interesting text feature: first person pronouns (FPP). In the literature, higher first person pronoun use has been associated with lower mood and even depression. However, most studies are unable to disaggregate the within- and between-person effects. So the question becomes: is this a trait-like phenomenon or a state-like phenomenon? To examine this, instead of using the total FPP count within a day, FPP usage was calculated by counting the number of FPPs (such as “I” and “me”) used within a day and dividing that by the total number of words used that day. This allows us to look at this variable independent of the effect of daily word count.
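A sketch of that calculation, assuming a hypothetical day_text column holding each day’s concatenated keyboard text, and an assumed (not exhaustive) pronoun list:

```r
library(stringr)

fpp_pattern <- "\\b(i|me|my|mine|myself)\\b"  # assumed first person pronoun set

text_features <- text_features %>%
  mutate(
    fpp_count = str_count(str_to_lower(day_text), fpp_pattern),
    fpp_rate  = fpp_count / word_count  # proportion of the day's words that are FPPs
  )
```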

Non-aggregated model

Again, we can see in both the plot and in the table that there seems to be a negative relationship between the two variables, but now we need to determine if this is a between- or within-subjects effect (or both).

                           Daily Mood Score
Predictors                 Estimates   std. Beta   CI                standardized CI   p
(Intercept)                57.81                   52.55 – 63.08                       <0.001
First Person Pronoun Use   -35.31      -0.07       -60.34 – -10.29   -0.12 – -0.02     0.006

Random Effects
σ2                    349.33
τ00 participant_id    250.23
ICC                     0.42
N participant_id          45
Observations            1041

Between-subjects effects

It appears as though there is no between-subjects effect of average FPP usage on average daily mood.

                     Mean Daily Mood Score
Predictors           Estimates   CI                 p
(Intercept)          60.40       43.94 – 76.86      <0.001
Mean Daily FPP Use   -70.74      -278.78 – 137.30   0.497

Observations         45
R2 / R2 adjusted     0.011 / -0.012

Within-subjects effects

And here we clearly see that there is a within-subjects effect. In essence, what this means is that, if an individual uses more first person pronouns on any given day, they are more likely to report having low mood.

                                  Daily Mood Score
Predictors                        Estimates   std. Beta   CI              standardized CI   p
(Intercept)                       55.11                   50.19 – 60.04                     <0.001
Deviations from Average FPP Use   -1.88       -0.07       -3.04 – -0.72   -0.12 – -0.03     0.001

Random Effects
σ2                    348.34
τ00 participant_id    252.93
ICC                     0.42
N participant_id          45
Observations            1041

Best Fitting model

Now, the models that we’ve been running have kept the variables separate from one another. Let’s bring them together and see which effects are strongest when we compare them directly. In the figure below, we’ve taken the within-person models from the three features above and plotted them. To understand this graph, you simply have to know that the further a distribution is from the center line (0), the larger the effect, and if the distribution crosses the center line, the effect is not considered significantly different from 0. Here we can see that Word Count is closest to the zero line, meaning it has the smallest effect. In addition, it just barely crosses the zero line, which indicates that the effect is not statistically significant. The next furthest away is the FPP deviations variable. It is on the left side of the center line, meaning a negative effect, and since it doesn’t cross the center line, it is significantly different from 0. In essence, this is just another way of showing what we already knew from above: more FPP use is negatively related to daily mood. Finally, furthest from the center line is the sentiment variable. It is on the right side of the center line, meaning a positive relationship! Positive mood is associated with positive word usage (at least when you say more positive things than you usually do).

Now this is all great, but what if we were to put all of these variables into ONE model? Would that explain the variability in daily reported mood better than any one variable by itself? If so, how do we know which variables to choose? Not only that, but we can also see whether the interaction between these variables might be meaningful (more on that in a second). One way we can do this is by running an algorithm that swaps terms in and out and finds the model that explains the variability in mood best. For time’s sake, I won’t show that process, but I will tell you that the model that best explains the data is one with sentiment, FPP usage, and the interaction between the two. So what does that look like? Not much different from above, but now we know it is the model that best explains the data. A sketch of the winning specification, followed by the results, is below. Take a look!
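In lmer syntax, with variable names matching the table below, that model would be:

```r
# Sentiment, FPP use, and their interaction (the * expands to both main
# effects plus the interaction), with a random intercept per participant
best_fit <- lmer(
  daily_mood ~ fpp_p_zscore * daily_sentiment_zscore + (1 | participant_id),
  data = model_df
)
summary(best_fit)
```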

                                      Daily Mood Score
Predictors                            Estimates   CI              p
(Intercept)                           55.10       50.18 – 60.02   <0.001
fpp_p_zscore                          -1.80       -2.95 – -0.65   0.002
daily_sentiment_zscore                2.20        1.04 – 3.35     <0.001
fpp_p_zscore:daily_sentiment_zscore   1.16        0.05 – 2.27     0.040

Random Effects
σ2                             341.52
τ00 participant_id             252.22
ICC                              0.42
N participant_id                   45
Observations                     1041
Marginal R2 / Conditional R2   0.017 / 0.434

It’s now all the same color because these effects are all coming from the same model. Additionally, the FPP and sentiment effects haven’t changed very much. There is a new term here, though: the bottom one, the interaction between the two variables. It just barely avoids touching the zero line, so it is (just barely) significant. But what exactly does it mean? Well, let’s pull up another figure to help explain it.

While this graph is a bit confusing, one way to interpret it is that when someone is +1 standard deviation above their average personal pronoun use, but also -2.5 standard deviations below their average sentiment, they’re more likely to report a low daily mood. In other words, if you’re talking about yourself more but using more negative words, you’re more likely to have a negative mood than if you’re talking about yourself more but using positive words. In addition, you’re also more likely to have a better day if you’re using fewer first person pronouns, even if you’re using a lot of negative sentiment words.

Now, this has all been great for understanding how some text features are related to daily mood. These types of models help us understand how phenomena are related, which is a useful endeavor. But what if we wanted to go beyond explaining and actually predict an individual’s mood?

Predictive modeling

So far, we’ve explored the data, looked at meaningful differences between positive and negative days in the text data, and examined the relationships between certain text features and reported mood. Now, we can see how good we can get at predicting an individual’s daily mood score. Let’s start with the model we left off with, the one that explains the variability in daily mood better than any other model so far. We can take that fitted model and ask, “if we plug in those same numbers (such as FPP use and sentiment), what value does the model predict?” We can then compare the predicted values against the actual values to see how we did.
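In code, that amounts to calling predict() on the fitted model and lining the predictions up with the observed scores (a sketch, reusing the hypothetical model_df and a hypothetical date column):

```r
# In-sample predictions from the fitted model, side by side with the truth
pred_df <- model_df %>%
  mutate(predicted_mood = predict(best_fit)) %>%
  select(participant_id, date, daily_mood, predicted_mood)
```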

I’m going to show an interactive plot below. To make visualization a bit easier, I’ve randomly selected 10 participants (as differentiated by color) from the dataset and we’ll see how accurate the predicted values are compared to the real values (fingers crossed)! Go ahead and use the slider on the bottom to toggle between predicted and actual values.

So what does this mean? It means that just because there’s a relationship between these variables and daily mood, it does not mean that on their own they’re able to do a great job of predicting daily mood scores. So while we have a better understanding of what might be related to mood, we’re still a ways off from being able to use this information to accurately predict mood.

Splitting the dataset

Now, notably, I used the same dataset to build the model as I did to test its predictions, which is essentially the same as predicting something in real life after it’s already happened (which makes how poorly the model performed hurt all the more). A more appropriate approach would be to build the model on one set of data and see how it performs on a completely separate set. One way we can do this is by simply splitting our dataset into two groups: one for training the model, and one for testing it.

However, because we have data that are repeatedly measured across time, it wouldn’t be appropriate to randomly sample the data. In essence, we want past events predicting future ones, and not the other way around. So what we have to do is take the daily mood reports from before a certain point in time, train a model on those, and test it on the data from after that point. Another snag is that we should also account for the fact that the data are grouped within person. So what we’re going to do is take the last 7 reported mood events for each person and put those into our testing set. We will train our model on everything before those data points and test it on the last 7 data points from each individual. This leaves us with about 800 points for training and 200 for testing.
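A sketch of that person-wise, time-ordered split, again assuming a hypothetical date column:

```r
# Hold out each participant's last 7 mood reports as the test set
test_df <- model_df %>%
  group_by(participant_id) %>%
  arrange(date, .by_group = TRUE) %>%
  slice_tail(n = 7) %>%
  ungroup()

train_df <- anti_join(model_df, test_df, by = c("participant_id", "date"))
```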

As you can see in the figure above, I’ve pulled out 9 random participants to show how the data will be split. The red dots are what the model will be trained on, and the blue dots are what it will be tested on. Let’s fit the same model on the training set and then try it out on the test set (hopefully you’re not getting your hopes up). We can use this model as a benchmark to compare other models against. To do that, we can use what’s known as the Root Mean Square Error (RMSE). This is simply a measure of how far the predicted data points were from the actual ones, where a higher number indicates a poorer prediction and a lower number indicates a better prediction.
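RMSE is simple enough to compute by hand; here is a sketch of the benchmark fit and its test-set RMSE:

```r
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2, na.rm = TRUE))  # root mean square error
}

fit_train <- lmer(daily_mood ~ fpp_p_zscore * daily_sentiment_zscore +
                    (1 | participant_id), data = train_df)

test_preds <- predict(fit_train, newdata = test_df, allow.new.levels = TRUE)
rmse(test_df$daily_mood, test_preds)
```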

It looks like the RMSE value is 19.586. Now we can use this to compare other models and see if we can get the number lower. But before we do that, let’s see if we can visualize the predicted vs actual points again for one participant.

It’s….not great. Maybe we can have better luck with another participant?

Hmm, oh well. I guess the next step is to see if we can predict these data points with some other types of models.

Machine Learning

This is where we’ll turn to machine learning. Machine learning is notoriously bad at producing interpretable models, but often surprisingly good at predicting outcomes. Let’s see if we can’t put some machine learning algorithms to work. Here we’ll test two different algorithms: a Gradient Boosting Machine (GBM) and a Support Vector Machine (SVM). We’ll run these models and see which one has the best RMSE!

GBM Model

Essentially, a Gradient Boosting Machine fits the data with an ensemble of decision trees. There are a few different parameters we can pass to the model, but the most important are the number of interaction terms that can be used (the max tree depth) and the number of iterations that run.
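Here is a hedged sketch of how that tuning could be set up with caret (the feature set and grid values are assumptions, not the exact ones used):

```r
library(caret)

gbm_grid <- expand.grid(
  interaction.depth = 1:3,               # max tree depth
  n.trees           = seq(20, 100, 20),  # number of boosting iterations
  shrinkage         = 0.1,
  n.minobsinnode    = 10
)

gbm_fit <- train(
  daily_mood ~ fpp_p_zscore + daily_sentiment_zscore + wc_dev_z,
  data      = train_df,
  method    = "gbm",
  tuneGrid  = gbm_grid,
  trControl = trainControl(method = "cv", number = 5),
  verbose   = FALSE
)
plot(gbm_fit)  # RMSE across tree depths and iteration counts
```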

Based on the graph above, we can see that the best fit is with a max tree depth of 3 and about 40 iterations. Let’s see how well it did at predicting the test set data points.

Doesn’t seem to do all that great. And that’s reflected in the RMSE value. From this model, we were able to get an RMSE of 20.621. Interestingly, this is not better than our mixed-effects model! Let’s see if we can get under our benchmark with the SVM model.

SVM Model

While we initially got an RMSE of 19.666, we were able to fine-tune the model with an algorithm that optimizes two parameters: cost and epsilon, a margin of tolerance within which no penalty is given to errors (seen in the figure above). We then updated the model, and the new RMSE came out to 18.084, only slightly better than our benchmark.
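For reference, a sketch of that tuning step with e1071 (the grid values and feature set are assumptions):

```r
library(e1071)

# Grid search over cost and epsilon (the no-penalty margin of tolerance)
svm_tune <- tune(
  svm,
  daily_mood ~ fpp_p_zscore + daily_sentiment_zscore + wc_dev_z,
  data   = train_df,
  ranges = list(cost = 2^(2:8), epsilon = seq(0, 1, 0.1))
)

svm_preds <- predict(svm_tune$best.model, newdata = test_df)
rmse(test_df$daily_mood, svm_preds)
```

Let’s take a look at how well it predicted the results.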

Conclusion

Overall, we took a look at the variability in daily mood and at how texting patterns differ on positive vs. negative days. Additionally, we examined the relationship between different text features and daily mood, ultimately finding that the within-subjects effects were much more striking and meaningful than the between-subjects effects. Unfortunately, just because there are relationships between these variables does not mean that we are close to being able to predict mood from the data we have. We’ll need plenty more data, potentially other features from the text data, and other streams of passive sensing data (such as motion, GPS, and light sensors). Once we incorporate all of these streams into a larger model, it’s possible that we’ll have better predictive power in our hands, and be able to do some real good for people.

Adam Pettitt
Graduate Research Fellow

Clinical Psychology PhD student at the University of Oregon.
